Using the Equivalent Kernel to Understand Gaussian Process Regression

Authors

  • Peter Sollich
  • Christopher K. I. Williams
Abstract

The equivalent kernel [1] is a way of understanding how Gaussian process regression works for large sample sizes based on a continuum limit. In this paper we show (1) how to approximate the equivalent kernel of the widely-used squared exponential (or Gaussian) kernel and related kernels, and (2) how analysis using the equivalent kernel helps to understand the learning curves for Gaussian processes.

Consider the supervised regression problem for a dataset D with entries (xᵢ, yᵢ) for i = 1, …, n. Under Gaussian process (GP) assumptions the predictive mean at a test point x∗ is given by

  f̄(x∗) = k⊤(x∗)(K + σ²I)⁻¹y,  (1)

where K denotes the n × n matrix of covariances between the training points with entries k(xᵢ, xⱼ), k(x∗) is the vector of covariances k(xᵢ, x∗), σ² is the noise variance on the observations and y is an n × 1 vector holding the training targets. See e.g. [2] for further details. We can define a vector of functions h(x∗) = (K + σ²I)⁻¹k(x∗). Thus we have f̄(x∗) = h⊤(x∗)y, making it clear that the mean prediction at a point x∗ is a linear combination of the target values y. Gaussian process regression is thus a linear smoother; see [3, section 2.8] for further details.

For a fixed test point x∗, h(x∗) gives the vector of weights applied to the targets y. Silverman [1] called h(x∗) the weight function. Understanding the form of the weight function is made complicated by the matrix inversion of K + σ²I and the fact that K depends on the specific locations of the n datapoints. Idealizing the situation, one can consider the observations to be "smeared out" in x-space at some constant density of observations. In this case analytic tools can be brought to bear on the problem, as shown below. By analogy to kernel smoothing, Silverman [1] called the idealized weight function the equivalent kernel (EK).

The structure of the remainder of the paper is as follows: In section 1 we describe how to derive the equivalent kernel in Fourier space.
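The linear-smoother view of eq. (1) can be checked numerically. The following is a minimal numpy sketch, not from the paper itself: the squared exponential kernel, the lengthscale, the sine-wave data, and all sizes are illustrative assumptions. It computes the weight function h(x∗) and verifies that f̄(x∗) = h⊤(x∗)y agrees with the direct form of eq. (1).

```python
import numpy as np

def se_kernel(a, b, ell=0.5):
    """Squared exponential covariance k(a, b) for 1-D inputs (illustrative)."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell ** 2)

rng = np.random.default_rng(0)
n, sigma2 = 50, 0.1                       # n training points, noise variance
x = np.sort(rng.uniform(-3, 3, n))
y = np.sin(x) + rng.normal(0.0, np.sqrt(sigma2), n)

K = se_kernel(x, x)                       # n x n covariance of training inputs
x_star = np.array([0.7])                  # a single test point
k_star = se_kernel(x, x_star)[:, 0]       # covariances k(x_i, x*)

# Weight function: h(x*) = (K + sigma^2 I)^{-1} k(x*)
h = np.linalg.solve(K + sigma2 * np.eye(n), k_star)

# Predictive mean two ways: eq. (1) directly, and as the smoother h(x*)^T y
f_bar_direct = k_star @ np.linalg.solve(K + sigma2 * np.eye(n), y)
f_bar_smooth = h @ y
assert np.allclose(f_bar_direct, f_bar_smooth)
```

Plotting h against x for various test points would show the weight vectors that Silverman [1] calls the weight function.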
Section 2 derives approximations for the EK for the squared exponential and other kernels. In section 3 we show how to use the EK approach to estimate learning curves for GP regression, and compare GP regression to kernel regression using the EK.

1 Gaussian Process Regression and the Equivalent Kernel

It is well known (see e.g. [4]) that the posterior mean for GP regression can be obtained as the function which minimizes the functional

  J[f] = (1/2)‖f‖²_H + (1/2σ²) Σ_{i=1}^n (yᵢ − f(xᵢ))²
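The Fourier-space derivation mentioned above can be sketched numerically. Under a uniform input density ρ, the equivalent kernel acts in Fourier space as the filter S(s) / (S(s) + σ²/ρ), where S(s) is the power spectrum of the covariance function; the sketch below applies this to the squared exponential kernel on a 1-D grid via the FFT. The grid size, lengthscale, noise level, and density are illustrative assumptions, not values from the paper.

```python
import numpy as np

ell, sigma2, rho = 0.5, 0.1, 20.0         # lengthscale, noise variance, density

N, L = 2048, 40.0                         # grid points, domain width
d = L / N                                 # spatial grid spacing
x = (np.arange(N) - N // 2) * d           # symmetric spatial grid
s = np.fft.fftfreq(N, d=d)                # frequencies, cycles per unit length

# Power spectrum of the 1-D SE kernel k(r) = exp(-r^2 / (2 ell^2)):
# S(s) = sqrt(2 pi) ell exp(-2 pi^2 ell^2 s^2)
S = np.sqrt(2 * np.pi) * ell * np.exp(-2 * np.pi ** 2 * ell ** 2 * s ** 2)

# EK in Fourier space is a low-pass filter; transform back to x-space.
h_tilde = S / (S + sigma2 / rho)
h = np.fft.fftshift(np.real(np.fft.ifft(h_tilde))) / d

# The EK weights integrate to roughly one (exactly h_tilde at s = 0).
print(round(np.sum(h) * d, 3))            # prints 0.996
```

Increasing the density ρ pushes the filter toward 1 over a wider band, so the equivalent kernel becomes narrower and more oscillatory, which is the regime the EK analysis exploits.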


Similar resources

Understanding Gaussian Process Regression Using the Equivalent Kernel

The equivalent kernel [1] is a way of understanding how Gaussian process regression works for large sample sizes based on a continuum limit. In this paper we show how to approximate the equivalent kernel of the widely-used squared exponential (or Gaussian) kernel and related kernels. This is easiest for uniform input densities, but we also discuss the generalization to the non-uniform case. We ...


Determining Effective Factors on Land Surface Temperature of Tehran Using LANDSAT Images And Integrating Geographically Weighted Regression With Genetic Algorithm

Due to urbanization and changes in the urban thermal environment, and since the land surface temperature (LST) in urban areas is a few degrees higher than in surrounding non-urbanized areas, identifying spatial factors affecting LST in urban areas is very important. Hence, by identifying these factors, preventing this phenomenon becomes possible using general education, inserting rules and al...


Multiple Gaussian Process Models

We consider a Gaussian process formulation of the multiple kernel learning problem. The goal is to select the convex combination of kernel matrices that best explains the data and by doing so improve the generalisation on unseen data. Sparsity in the kernel weights is obtained by adopting a hierarchical Bayesian approach: Gaussian process priors are imposed over the latent functions and general...


Machine Learning Approaches on a Travel Time Prediction Problem

This thesis concerns the prediction of travel times between two points on a map, based on a combination of link-scale road network data and historical trip-scale data. The main idea is that the predictions using the road network data can be improved by a correction factor estimated from historical trip data. The correction factor is estimated both using a Machine Learning approach, more specifi...


Predicting the Young's Modulus and Uniaxial Compressive Strength of a typical limestone using the Principal Component Regression and Particle Swarm Optimization

In geotechnical engineering, rock mechanics and engineering geology, depending on the project design, the uniaxial strength and static Young's modulus of rocks are of vital importance. The direct determination of the aforementioned parameters in the laboratory, however, requires intact and high-quality cores, and preparation of their specimens has some limitations. Moreover, performing thes...





Publication date: 2004